NSF PAR Search | NSF Public Access Repository



VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Kamoi, Ryo; Zhang, Yusen; Das, Sarkar Snigdha Sarathi; Zhang, Ranran Haoran; Zhang, Rui (October 2025, Second Conference on Language Modeling)

Free, publicly accessible full text available October 7, 2026

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Zhang, Yusen; Zheng, Wenliang; Madasu, Aashrith; Shi, Peng; Kamoi, Ryo; Zhou, Hao; Zou, Zhuoyang; Zhao, Shu; Das, Sarkar Snigdha Sarathi; Gupta, Vipul; et al. (October 2025, International Conference on Computer Vision)

Free, publicly accessible full text available October 19, 2026

GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers

Das, Sarkar Snigdha Sarathi; Kamoi, Ryo; Pang, Bo; Zhang, Yusen; Xiong, Caiming; Zhang, Rui (April 2025, The International Conference on Learning Representations)

Free, publicly accessible full text available April 24, 2026

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization

Kamoi, Ryo; Goyal, Tanya; Durrett, Greg (January 2023, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics)

Despite recent progress in abstractive summarization, models often generate summaries with factual errors. Numerous approaches to detect these errors have been proposed, the most popular of which are question answering (QA)-based factuality metrics. These have been shown to work well at predicting summary-level factuality and have the potential to localize errors within summaries, but this latter capability has not been systematically evaluated in past research. In this paper, we conduct the first such analysis and find that, contrary to our expectations, QA-based frameworks fail to correctly identify error spans in generated summaries and are outperformed by trivial exact match baselines. Our analysis reveals a major reason for such poor localization: questions generated by the question generation (QG) module often inherit errors from non-factual summaries, which are then propagated further into downstream modules. Moreover, even human-in-the-loop question generation cannot easily offset these problems. Our experiments conclusively show that there exist fundamental issues with localization using the QA framework that cannot be fixed solely by stronger QA and QG models.
Full Text Available

WiCE: Real-World Entailment for Claims in Wikipedia

https://doi.org/10.18653/v1/2023.emnlp-main.470

Kamoi, Ryo; Goyal, Tanya; Rodriguez, Juan; Durrett, Greg (January 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)

Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation. However, these represent a significant domain shift from existing entailment datasets, and models underperform as a result. We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim, and a minimal subset of evidence sentences that support each subclaim. To support this, we propose an automatic claim decomposition strategy using GPT-3.5 which we show is also effective at improving entailment models’ performance on multiple datasets at test time. Finally, we show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
Full Text Available
